Abstract

Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals 
has brought about the need for a central resource that facilitates i) performing GWAS, ii) accessing 
data and results of other GWAS, and iii) enabling all users regardless of their background to 
exploit the latest statistical techniques without having to manage complex software and computing 
resources. 

Results: We present easyGWAS, a web platform that provides methods, tools and dynamic visualizations to perform and analyze GWAS. In addition, easyGWAS makes it simple to reproduce the results of others, validate findings, and access larger sample sizes by merging public datasets.
Availability: Detailed method and data descriptions as well as tutorials are available in the 
supplementary materials. easyGWAS is available at http://easygwas.tuebingen.mpg.de/. 
Contact: dominik.grimm@tuebingen.mpg.de 

1 Motivation 

Genome-wide association studies (GWAS) are an integral tool for discovering the polygenic architecture underlying many complex traits. The recent steady growth of GWAS applications across species (e.g. Arabidopsis thaliana [1, 2, 3], Drosophila melanogaster [4]) has generated a wealth of genotypic and phenotypic data. This makes it possible to search for significant association signals for multiple traits in one species, or for related traits in different species. Further analysis of the associations shared across traits may provide invaluable insights for genetics and evolutionary biology. First, comparing GWAS results in different species may enable statistical validation of association signals, for instance to further support findings in human genetics by replicating compatible results in model organisms. Second, one can examine the hypothesis that genetic factors influencing phenotypes can be traced back to related genes and mutations across species. By discovering these common genetic origins of phenotypes, we may gain a deeper understanding of the adaptation of species and possibly of the convergent or parallel evolution of complex traits.

1.1 Difficulties in performing GWAS across traits and species 

Obtaining genome-wide association mapping results across multiple studies, traits or even species, however, is still a cumbersome enterprise, complicated by three problems. First, several software packages (e.g. PLINK [5]) and species-specific websites (e.g. DGRP [4], Matapax [6], EmmaServer [7]) allow users to perform genome-wide association studies on a given dataset. However, they either provide no genotype and phenotype data at all, or provide them for only a single species. Second, existing databases of GWAS results (e.g. GWAS Catalog [8]) focus primarily on human genetics and present summary statistics only for the top-scoring loci, even though the most significant genetic loci alone often explain only a small fraction of the heritability of complex traits [9]. Third, databases of human genotypic and phenotypic data such as NCBI dbGaP [10] require a formal application and evaluation before data access is granted. Data from GWAS in model organisms and crops is in principle more easily accessible, but can only be obtained from individual websites in a variety of data formats. Anyone who wants to run association studies with identical parameter settings on these datasets must first perform tedious data preprocessing and data integration steps.

1.2 Role of easyGWAS

The field is missing a platform that allows easy and open access to published genotype and phenotype data from model organisms and crops, and that can perform GWAS on different traits and species. Here, we announce the release of easyGWAS, an interactive and easy-to-use online platform whose purpose is to fill this gap. easyGWAS is available at http://easygwas.tuebingen.mpg.de/. It enables users to perform GWAS online on their own private data or on publicly available data. It also lets them store and publish phenotypic data, meta information on the samples, and GWAS results.

1.3 Functionality 

Genotypic and phenotypic data from a continuously growing set of published GWA studies are prestored in easyGWAS. Users can either work with these public phenotypes or upload their own. Phenotype data uploaded by a user can be kept private for primary analyses, shared with a restricted set of collaborators, or made publicly available so that other researchers can reuse them in their own GWAS. easyGWAS allows users to perform univariate association tests interactively, without the need to manage any computing resources or software. Depending on whether the phenotype is binary or continuous, easyGWAS offers suitable mapping algorithms to the user. The results of a completed GWAS can then be visualized in Manhattan plots with gene annotations for the top-scoring signals. The example in Section 2 and the Supplementary Material include detailed instructions on how to perform GWAS in easyGWAS.

1.4 Inter-compatibility with statistical genetics software packages 

Some analyses are currently available only in existing statistical genetics software packages and not in easyGWAS. For these, the user can export easyGWAS data and store them locally in a file format readable by those packages. Data export to PLINK, comma-separated values (CSV) and hierarchical data format (HDF5) files is already available in easyGWAS.
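The following sketch illustrates what an export to PLINK involves. It writes a genotype matrix into PLINK's plain-text PED/MAP pair, a format whose column layout is documented by PLINK. The sketch assumes genotypes coded 0/1/2 as minor-allele counts (see Table 1), known major and minor alleles per SNP, and a missing phenotype written as -9. The function name `write_plink_text` and its argument layout are hypothetical and do not come from easyGWAS.

```python
from typing import List, Optional, Sequence, Tuple

# (chromosome, SNP id, base-pair position, major allele, minor allele)
SnpInfo = Tuple[str, str, int, str, str]


def write_plink_text(
    samples: Sequence[str],
    phenotype: Sequence[Optional[float]],
    genotypes: Sequence[Sequence[int]],  # samples x SNPs, coded 0/1/2, other = missing
    snps: Sequence[SnpInfo],
) -> Tuple[str, str]:
    """Return (ped_text, map_text) in PLINK's text PED/MAP layout (hypothetical helper)."""
    ped_lines: List[str] = []
    for sample, pheno, row in zip(samples, phenotype, genotypes):
        alleles: List[str] = []
        for code, (_, _, _, major, minor) in zip(row, snps):
            # the code counts copies of the minor allele
            if code == 0:
                alleles += [major, major]
            elif code == 1:
                alleles += [major, minor]
            elif code == 2:
                alleles += [minor, minor]
            else:
                alleles += ["0", "0"]  # PLINK's missing-genotype code
        pheno_field = "-9" if pheno is None else str(pheno)
        # FID IID PAT MAT SEX PHENOTYPE: family ID = sample ID, parents and sex unknown (0)
        ped_lines.append(" ".join([sample, sample, "0", "0", "0", pheno_field] + alleles))
    # MAP columns: chromosome, SNP id, genetic distance (0 = unknown), bp position
    map_lines = [f"{chrom}\t{snp_id}\t0\t{pos}" for chrom, snp_id, pos, _, _ in snps]
    return "\n".join(ped_lines) + "\n", "\n".join(map_lines) + "\n"
```

For inbred, fully homozygous lines only codes 0 and 2 occur, so each sample contributes identical allele pairs.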

2 Example 

In this usage example, we demonstrate how to perform a GWA study in the plant model organism Arabidopsis thaliana. We use the already published phenotype FLC, which is related to the flowering time of the plant. For the FLC phenotype, RNA was extracted from leaves after four weeks of growth. Gene expression levels were determined by northern hybridization and quantified relative to beta-tubulin expression.

Creating a new GWA experiment is divided into several intuitive steps. To follow our instructions, the user should first navigate to the easyGWAS experiment wizard by clicking on GWAS Center and then on Create new GWAS.

1. First, the user selects a species and a dataset. In our example, we select the species Arabidopsis thaliana and the dataset AtPolyDB (call method 75, Horton et al.) and then click on Continue.

2. In the second step, the user has to select a phenotype, choosing among published, private/shared and public phenotypes. Published phenotypes are accompanied by a peer-reviewed research article. Public phenotypes were made public by another user but need not originate from a publication. Additionally, the user can upload their own data (see Supplementary Materials or the online FAQ for detailed tutorials). Here, we select the already published phenotype FLC [3]. For this purpose, we select the tab 2.1 Select an existing published phenotype and type the name of the phenotype, FLC, into the input field. Auto-completion helps the user select the correct phenotype. We proceed by clicking on Continue.

3. In this step, users can add additional factors such as principal components or covariates (e.g. environmental factors, gender-specific characteristics). In our example, we do not add any additional factors. We click Continue in the tab 3.1 No additional factors.

4. Now the user has to select genotypic data. Here, all provided SNPs, specific chromosomes or a region of SNPs can be selected. We select chromosomes 1 and 5 in the tab 4.2 Select one or several chromosomes for Arabidopsis thaliana by checking the boxes and click on Select chromosomes.

5. To perform a GWAS, we have to select the association method we intend to use. In the algorithms view, one can also apply different transformations or filters to the data, such as normalizing the phenotypes. The available methods and transformations depend on the chosen data: the web application analyzes the data on the fly and enables only those options that are applicable to it. Here we keep the default settings, using linear regression without any transformations. Then we click on Continue.

6. In the last step, users can check all inputs and can make adjustments if necessary. If everything 
is correct, the experiment can be submitted to the computation servers. For this purpose, we 
simply click on Submit Experiment. 
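The on-the-fly check in step 5 can be pictured as inspecting the selected data before offering options. The sketch below is hypothetical. It detects whether the phenotype is binary and whether the genotypes are purely homozygous (0/2 codes only). It offers the Wilcoxon rank-sum test only for homozygous genotypes, following Section B.1, and enables log and Box-Cox only for strictly positive phenotypes and the square root only for non-negative ones. The assumption that regression methods are always offered, and the option names, are illustrative and not easyGWAS's actual decision logic.

```python
from typing import Dict, List, Sequence


def applicable_options(
    phenotype: Sequence[float], genotypes: Sequence[Sequence[int]]
) -> Dict[str, List[str]]:
    """Hypothetical rules for enabling methods and transformations (0/1/2 genotype codes)."""
    values = list(phenotype)
    codes = {code for row in genotypes for code in row}
    binary = len(set(values)) == 2  # e.g. case/control coded as 0/1
    homozygous = 1 not in codes     # code 1 marks a heterozygous call

    # assumption: regression-based methods are always offered
    methods = ["linear regression", "EMMAX", "FaST-LMM"]
    if homozygous:
        methods.append("Wilcoxon rank-sum test")

    transforms: List[str] = []
    if not binary:  # transforming a two-valued phenotype is not meaningful
        transforms.append("standardize")
        if min(values) >= 0:
            transforms.append("square root")  # defined for non-negative values
        if min(values) > 0:
            transforms += ["log", "Box-Cox"]  # require strictly positive values
    return {"methods": methods, "transforms": transforms}
```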

Finally, the experiment is submitted and all computations are performed in the background. The current view refreshes every 3 seconds. Meanwhile, users can submit new experiments or browse the data. This example finishes in around 60 seconds, after which the user is automatically redirected to the result view to analyze the results. The result view provides dynamic Manhattan plots. Every single SNP can be explored in more detail by moving the mouse over its point in the plot. On the left, we provide a list of the top 10 SNPs and the genes in which they are located. In our example, the user can see at a glance that, for instance, the top three SNPs are located on chromosome 5. Additionally, we provide a more detailed SNP annotation view, quantile-quantile plots (QQ-plots) and a phenotype-specific view with details about the phenotype (see Supplementary Materials for a detailed description and screenshots). Summary statistics can be downloaded in various formats for further analysis with third-party tools.

Additional detailed tutorials (supported by screenshots) about uploading, sharing and downloading 
data are included in the Supplementary Materials. 

3 Conclusion and future plans 

easyGWAS is designed to be a dynamically evolving platform with a growing number of functions and prestored datasets. At present, easyGWAS offers published genotypic and phenotypic data for Arabidopsis thaliana [1, 2, 3] and Drosophila melanogaster [4], and users can upload their own phenotypic data. easyGWAS enables single-locus mapping with population structure correction for one trait at a time.

In future versions of easyGWAS, we plan to extend the list of species and to allow users to upload their own genotypic data while retaining data quality and reliability. Further methods for multi-locus and multi-trait mapping and for automatically retrieving shared association signals across traits will be included.

In summary, we believe that easyGWAS will foster new types of genetic analyses by providing a convenient framework that includes data and algorithms for obtaining GWAS results across traits, studies and species.
A Data: Genotypic & phenotypic data and meta information



A.1 Available published data

To make it easy to perform genome-wide association studies (GWAS) across different species, a variety of published genotypes and phenotypes are prestored in the easyGWAS database. As of November 2012, data for Arabidopsis thaliana and Drosophila melanogaster are available in our public database. For Arabidopsis thaliana we included different data sources. The first dataset ['AtPolyDB (call method 75, Horton et al.)'] includes 1,307 samples presented by Horton et al. in 2012 [1]. Furthermore, we included 107 phenotypes described and analyzed by Atwell et al. [3]; these 107 phenotypes were measured for a subset of the 1,307 samples. The second dataset ['80 genomes data (Cao et al.)'] includes 80 samples from the first phase of the 1001 Genomes Project in Arabidopsis thaliana [2]. The genome matrix from the 1001 Genomes website¹ was used to retrieve all single nucleotide polymorphisms (SNPs). For this purpose, we excluded all positions with incomplete information and kept all positions with at least one consecutive nucleotide. All SNPs in these Arabidopsis thaliana datasets are homozygous. The SNPs are encoded as described in Table 1.





                major allele   minor allele
major allele         0              1
minor allele         1              2

Table 1: SNP encoding
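A minimal sketch of the encoding in Table 1 follows. The value is the number of minor-allele copies, so homozygous Arabidopsis thaliana data only contains 0 and 2. The sketch assumes each call is given as a pair of nucleotides and that the major allele is the more frequent allele across samples. The function name is illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def encode_snp(calls: Sequence[Tuple[str, str]]) -> List[int]:
    """Encode one biallelic SNP as minor-allele counts (0, 1 or 2) per sample."""
    allele_counts = Counter(allele for call in calls for allele in call)
    if len(allele_counts) != 2:
        raise ValueError("expected exactly two alleles at a biallelic SNP")
    # most_common orders by frequency (ties keep first-seen order); the second is the minor allele
    _, (minor, _) = allele_counts.most_common(2)
    return [sum(allele == minor for allele in call) for call in calls]
```

For example, `encode_snp([("A", "A"), ("G", "G"), ("A", "A")])` gives `[0, 2, 0]`, because G is the less frequent allele.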

For the species Drosophila melanogaster we integrated a dataset ['Drosophila Genetic Reference Panel (DGRP, Mackay et al.)'] with 172 samples [4], sequenced and analyzed by Mackay et al. We also integrated three phenotypes [4, 11, 12, 13], which become six phenotypes after splitting each into male and female. Missing SNPs in the Drosophila melanogaster genome were imputed using the majority allele. Further imputation modes are currently being added to easyGWAS and will be available soon.
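Majority-allele imputation replaces every missing call at a SNP with the allele observed most often at that SNP. The sketch below assumes one column of single-letter calls per SNP, with "N" marking a missing call. It illustrates the idea and is not the easyGWAS code.

```python
from collections import Counter
from typing import List, Sequence


def impute_majority(column: Sequence[str], missing: str = "N") -> List[str]:
    """Replace missing calls at one SNP with the most frequent observed allele."""
    observed = Counter(call for call in column if call != missing)
    if not observed:
        raise ValueError("SNP has no observed calls to impute from")
    majority = observed.most_common(1)[0][0]
    return [majority if call == missing else call for call in column]
```

Applied to a samples-by-SNPs matrix `rows`, `[impute_majority(col) for col in zip(*rows)]` imputes each SNP column independently.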

Additionally, we integrated gene annotations for all organisms. This information is used to determine whether a SNP is located within a gene.
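One simple way to answer that question is a sorted-interval lookup per chromosome using binary search. The sketch assumes non-overlapping genes with 1-based inclusive coordinates, as in GFF files; overlapping genes would need an interval tree instead. The class name and the example gene identifiers are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class GeneIndex:
    """Map SNP positions to genes, assuming non-overlapping genes per chromosome."""

    def __init__(self, genes: List[Tuple[str, int, int, str]]) -> None:
        # genes: (chromosome, start, end, gene_id), 1-based inclusive coordinates
        self._starts: Dict[str, List[int]] = {}
        self._records: Dict[str, List[Tuple[int, int, str]]] = {}
        for chrom, start, end, gene_id in sorted(genes):  # sorted by chromosome, then start
            self._starts.setdefault(chrom, []).append(start)
            self._records.setdefault(chrom, []).append((start, end, gene_id))

    def gene_at(self, chrom: str, position: int) -> Optional[str]:
        """Return the gene containing the position, or None ("No gene found")."""
        starts = self._starts.get(chrom, [])
        # index of the last gene starting at or before the SNP position
        i = bisect.bisect_right(starts, position) - 1
        if i < 0:
            return None
        start, end, gene_id = self._records[chrom][i]
        return gene_id if start <= position <= end else None
```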

Publicly available genotypes and phenotypes are accompanied by additional meta information, such as growth conditions in Arabidopsis thaliana or Wolbachia status in Drosophila melanogaster. All datasets were downloaded from their official websites (Table 2).

1 www.1001genomes.org



AtPolyDB (call method 75, Horton et al.)
    Genotypes:    https://cynin.gmi.oeaw.ac.at/home/resources/atpolydb
    Phenotypes:   https://cynin.gmi.oeaw.ac.at/home/resources/atpolydb
                  http://arabidopsis.gmi.oeaw.ac.at:5000/DisplayResults/

80 genomes data (Cao et al.)
    Genotypes:    http://1001genomes.org/data/MPI/MPICao2010/releases/

Arabidopsis thaliana annotations
    Annotations:  http://www.arabidopsis.org/

Drosophila Genetic Reference Panel (DGRP, Mackay et al.)
    Genotypes:    http://dgrp.gnets.ncsu.edu/freeze1/Illumina_+_454_SNP_genotypes_filtered_for_GWAS/
    Phenotypes:   http://dgrp.gnets.ncsu.edu/freeze1/Phenotypes/

Drosophila melanogaster annotations
    Annotations:  ftp://ftp.flybase.net/releases/FB2008_10/dmel_r5.13/gff/

Table 2: Data sources for all integrated organisms



A.2 How to upload new data

A.2.1 Phenotypic data and meta information

Registered users can upload private data such as phenotypes, covariates or meta information using the easyGWAS wizard (see Tutorial E.2). These data can be used to perform new GWAS or can be shared with collaborators and colleagues. Furthermore, data can be made public to the scientific community. We distinguish between published and public data. Published data is integrated by the easyGWAS team using data from peer-reviewed publications, whereas public data was made public by an easyGWAS user. We also provide a contact form for authors who would like the easyGWAS team to integrate their published phenotype data, rather than uploading the data themselves.

A.2.2 New genotypic data

Currently, we provide several datasets for two species. We plan to include datasets from further species to provide a richer selection of genotypic and phenotypic data. To retain quality, we provide an application form through which users can send us a formal data submission application, which we then evaluate. After a successful evaluation, we provide an upload link. After a successful upload and quality inspection of the data, our team integrates the data into our database (Figure 1).

A future extension will be a private upload option for small genotypic data sets. 

B Integrated methods to perform genome-wide association studies

In general, performing a genome-wide association study is not trivial. There are three main aspects that have to be considered. The first is data preprocessing: the scientist has to know how to encode, normalize and filter the genotypes, phenotypes and covariates. Second, the scientist has to know which method can be used for which kind of data. There are binary and continuous phenotypes as well as homozygous and heterozygous genotypes, and only specific methods can be applied to a specific type of data. Third, it is crucial to decide whether one should correct for population stratification or latent confounding factors. Furthermore, some of these methods are hard to parameterize or complicated to set up. One of the strengths of easyGWAS is that it provides several implemented methods and data transformations out of the box, which makes it easy for the user to perform a genome-wide association study.

Figure 1: Application process to submit new genotypic data (panels: new genotype data for individuals 1 to n; apply for submission using a form that describes the genotype data (species, number of SNPs, ...) and any additional phenotypes, covariates or meta-information; an upload link is provided; the new data is integrated into the published database)

B.1 Methods to perform a GWAS

The initial version provides several univariate algorithms, such as linear regression, linear mixed models (EMMAX [14], FaST-LMM [15]) and the Wilcoxon rank-sum test. Linear regression can be used to test for an association between a single SNP and a phenotype. Linear mixed models are used to correct for population structure, family structure and cryptic relatedness at the same time. Covariates such as principal components (PCs), environmental factors or gender-specific characteristics can be added to the linear regression and linear mixed models. Additionally, the Wilcoxon rank-sum test can be used for homozygous genotypes. These methods are state-of-the-art, and more methods will be added continuously. New methods for multi-marker discovery, such as two-locus search using graphics processing units (GPUs), and for multi-trait discovery will be added in the near future.
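The two simplest tests in this list can be sketched per SNP with SciPy. The first is an ordinary least-squares fit of the phenotype on the 0/1/2 genotype code, which tests whether the slope is zero. The second is a rank-sum test comparing phenotypes between the two homozygous genotype groups. This illustration omits covariates and any population-structure correction, so it is not the easyGWAS implementation, and the mixed models (EMMAX, FaST-LMM) are not shown. Function names are hypothetical.

```python
from typing import List, Sequence

import numpy as np
from scipy import stats


def linear_regression_pvalues(genotypes: np.ndarray, phenotype: Sequence[float]) -> List[float]:
    """One p-value per SNP (column) from a univariate least-squares fit.

    Monomorphic SNPs (constant columns) cannot be tested and should be filtered out first.
    """
    y = np.asarray(phenotype, dtype=float)
    # linregress tests the null hypothesis that the slope is zero
    return [float(stats.linregress(snp, y).pvalue) for snp in genotypes.T]


def wilcoxon_pvalues(genotypes: np.ndarray, phenotype: Sequence[float]) -> List[float]:
    """Rank-sum p-values comparing phenotypes of the two homozygous groups (codes 0 vs 2)."""
    y = np.asarray(phenotype, dtype=float)
    return [float(stats.ranksums(y[snp == 0], y[snp == 2]).pvalue) for snp in genotypes.T]
```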

B.2 Transformations to standardize data 

We added several methods to transform phenotypic and genotypic data. Genotypes can be standardized: one can center the data to zero mean and/or scale it to unit variance. Phenotypes can be transformed in the same way. Additionally, phenotypes can be log-, square-root- or Box-Cox-transformed. Figure 2 illustrates a scheme of all options.
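The transformations listed above can be sketched with NumPy and SciPy. Here, standardizing means subtracting the mean and dividing by the standard deviation, applied column-wise (per SNP) for genotype matrices. Box-Cox uses `scipy.stats.boxcox`, which estimates its λ parameter by maximum likelihood and requires strictly positive values. The text does not specify which variants easyGWAS uses internally, so choices such as the population standard deviation (`ddof=0`) are assumptions.

```python
import numpy as np
from scipy import stats


def standardize(values, center: bool = True, scale: bool = True) -> np.ndarray:
    """Zero-mean and/or unit-variance scaling along axis 0 (per SNP for genotype matrices)."""
    x = np.asarray(values, dtype=float)
    if center:
        x = x - x.mean(axis=0)
    if scale:
        # population standard deviation (ddof=0), an assumption; constant columns must be removed first
        x = x / x.std(axis=0)
    return x


def transform_phenotype(values, method: str) -> np.ndarray:
    """Apply one of the phenotype transformations named in the text."""
    y = np.asarray(values, dtype=float)
    if method == "log":
        return np.log(y)  # requires y > 0
    if method == "sqrt":
        return np.sqrt(y)  # requires y >= 0
    if method == "boxcox":
        transformed, _fitted_lambda = stats.boxcox(y)  # requires y > 0
        return transformed
    raise ValueError(f"unknown transformation: {method}")
```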



Figure 2: Scheme of all possible transformations and GWA mapping methods (panels: Genotype, Phenotype)

C The web application interface 

The web application contains three main parts. There is one view to plan, perform and store 
GWAS, a second view to browse and analyze the data and a third view to download available 
datasets. In the following, we will describe all sections in detail. 

C.1 The easyGWAS wizard and experiment history

The first section contains all tools needed to plan, perform and analyze genome-wide association studies. Here, registered users can follow a step-by-step procedure (software wizard) to easily create new experiments (see Tutorial E.1, Figure 3a). The wizard is divided into several steps.

- First, the user chooses an available species and dataset.
- Next, a single phenotype is selected: an already published, private or public phenotype, or one uploaded by the user. We distinguish between published and public phenotypes. Published phenotypes come from peer-reviewed publications, whereas public phenotypes were uploaded by a specific user and made public to the community.
- After choosing a phenotype, the user can add additional factors to the experiment, such as principal components or one or several covariates. Covariates are meta information such as environmental factors or gender-specific characteristics.
- Next, the user selects genotypic data: either all genotypic data (all available SNPs), specific chromosomes, or a range of SNPs.
- The last step provides different algorithms, standardizations and filters. The methods offered depend on the genotypic and phenotypic data selected in the previous steps.

The final summary view lists all user-specific selections and allows the user to adjust settings or to submit the experiment to the computation workers.

Figure 3: Screenshot from the GWA experiment view. Creating new GWA experiments and sharing the results with collaborators or the scientific community (panels: a) GWA Wizard, b) My temporary history, c) Shared experiments and Public experiments)

Each experiment performed is saved in a temporary experiment history (Figure 3b), where it is kept for primary analysis for at least 48 h. To keep interesting findings, users can store experiments permanently in their private profile. To simplify scientific exchange, all experiments can be shared via the web application with collaborators and colleagues (Figure 3c). Sharing experiments and data avoids laborious extraction of data and findings. Furthermore, data and experiments can be made public to the scientific community. All summary statistics can be downloaded for further analysis with third-party tools.

Performing GWAS can be time-consuming. Because computations are distributed to separate workers (see Section D), the user can continue working with the web application while the experiment is computed in the background. Automated email notifications are sent as soon as the computations are done. Additionally, the user can track the status of all experiments through the temporary history (Figure 4).

To examine individual experiments, each experiment has an interactive results page (Figure 5). The view is divided into two parts. The left part provides general information. A short summary table lists all settings made by the user, e.g. which species, dataset and parameters were selected (Figure 5a). The user can also see at a glance the annotations of the 10 SNPs with the smallest p-values (Figure 5b). Dynamic Manhattan plots for all chromosomes are rendered in the right half (Figure 5c). Each SNP within the Manhattan plot is interactive: the user can inspect single SNPs and obtain live information such as the corresponding p-value or the gene in which the SNP is located. The green line in each Manhattan plot is the Bonferroni threshold. The alpha significance level can be adjusted using the plotting options (Figure 5d). The strength of confounding by population structure can easily be explored with Q-Q plots (Figure 5e) and the genomic control factor λ. To show the actual distribution of a phenotype, the Phenotype Explorer displays histograms of the transformed and untransformed phenotype. It also computes a Shapiro-Wilk test of the null hypothesis that the data were drawn from a normal distribution (Figure 5f). To examine whether SNPs of interest are located within genes, significant loci are summarized in a gene-annotation view (Figure 5g).

Figure 4: Screenshot from temporary experiment history. The red highlighted row indicates that the experiment is still computing.

Figure 5: Screenshot showing the result page of an experiment (panels: brief summary of the settings; top 10 SNPs with annotations (Chr, Position, Gene); Manhattan plots per chromosome with chromosomal position [bp] on the x-axis; tabs for Manhattan Plots, QQ-Plots, Phenotype Explorer and SNP Annotations)
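The statistics on the result page follow standard definitions, sketched below with SciPy. The Bonferroni threshold is α/m for m tested SNPs, drawn at height -log10(α/m) in a Manhattan plot. The genomic control factor λ is the median of the observed 1-df χ² statistics divided by the median of the χ²(1) distribution (about 0.455). The QQ-plot compares sorted observed -log10(p) values with their expected values under the null. The Shapiro-Wilk test is `scipy.stats.shapiro`. The text does not describe exactly how easyGWAS computes these values, so the function names and details here are assumptions.

```python
import numpy as np
from scipy import stats


def bonferroni_line(n_snps: int, alpha: float = 0.05) -> float:
    """Height of the Bonferroni threshold on the -log10(p) axis of a Manhattan plot."""
    return float(-np.log10(alpha / n_snps))


def genomic_control_lambda(pvalues) -> float:
    """Median observed 1-df chi-square statistic divided by its expected median."""
    chi2_observed = stats.chi2.isf(np.asarray(pvalues, dtype=float), df=1)
    return float(np.median(chi2_observed) / stats.chi2.ppf(0.5, df=1))


def qq_points(pvalues):
    """Expected and observed -log10(p) values for a QQ-plot, both sorted ascending."""
    observed = np.sort(-np.log10(np.asarray(pvalues, dtype=float)))
    n = observed.size
    expected = np.sort(-np.log10((np.arange(1, n + 1) - 0.5) / n))
    return expected, observed


def shapiro_wilk_pvalue(phenotype) -> float:
    """Small values reject the null hypothesis that the phenotype is normally distributed."""
    _statistic, pvalue = stats.shapiro(phenotype)
    return float(pvalue)
```

A λ close to 1 suggests little inflation of the test statistics. Values clearly above 1 point to confounding such as population structure.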

C.2 The data center 

The second main section of the web application is the Data Center. Here, the user can browse available data, such as samples, phenotypes and covariates. Detailed information, such as meta and/or geographical information, can be accessed for each data entry (Figure 6). Associated publications are provided for all published entries. The Data Center contains two main views: one for published and public data, and one for all user-specific private/shared data. Private data is visible only to its owner. Note that privately shared data still belongs to its owner, so only the owner has permission to delete or modify it.
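The access rules just described can be summarized as a small permission check. The sketch assumes three visibility levels (private, shared with a set of collaborators, public). It also assumes that the owner-only rule for modification applies to public entries as well as shared ones. The class and function names are hypothetical and are not taken from the easyGWAS code.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DataEntry:
    """A phenotype or covariate entry with a hypothetical visibility model."""

    owner: str
    visibility: str = "private"  # "private", "shared" or "public"
    collaborators: Set[str] = field(default_factory=set)


def can_view(entry: DataEntry, user: str) -> bool:
    """Private entries: owner only; shared: owner and collaborators; public: everyone."""
    if user == entry.owner or entry.visibility == "public":
        return True
    return entry.visibility == "shared" and user in entry.collaborators


def can_modify(entry: DataEntry, user: str) -> bool:
    """Only the owner may modify or delete an entry, even after sharing it."""
    return user == entry.owner
```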



Figure 6: Data center with detailed information about samples, phenotypes and covariates (panels: sample summary with name, ID, species, dataset, country, region, latitude, longitude and source; additional meta information; publications; comments)
C.3 The download center



The third section provides additional download options. Here, whole datasets (genotypic and phenotypic data) for all integrated species can be downloaded in different file formats. At the moment, we provide the following formats: PLINK [5], comma-separated values (CSV) and hierarchical data format 5 (HDF5)².

D The web application backend 

The backend of the web application is written entirely in Django³, a web framework for Python. For the web design, we used the Cascading Style Sheets (CSS) provided by Twitter Bootstrap⁴. To handle the huge amount of SNP data, we developed a hybrid database model using a PostgreSQL database and the HDF5² file format. All SNPs, their positions and chromosome indices, as well as all phenotypes and covariates, are stored in the HDF5 file. Additional phenotype, sample, covariate and meta information is stored in the PostgreSQL database. HDF5 files are highly optimized for handling huge files and can be accessed quickly and easily. Because GWA mappings are resource-consuming, all computations are distributed to different computation servers (workers). To schedule tasks efficiently, we use a message broker (RabbitMQ⁵), which distributes the tasks to individual workers (Figure 7). The backend is designed so that the functionality of easyGWAS can easily be extended with novel methods. New species and datasets can be integrated within hours, depending on the size of the data. If more computational power is needed, new workers can be added dynamically.

Figure 7: Scheme of the web application infrastructure (components: clients; web server; and, invisible to the user, a hybrid database server (PostgreSQL and the HDF5 file format), periodic session workers, a message broker (RabbitMQ) and computation workers; technologies shown: Python, Celery, SciPy, VMware)

2 http://www.hdfgroup.org/HDF5/
3 https://www.djangoproject.com
4 http://twitter.github.com/bootstrap/
5 http://www.rabbitmq.com
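The division of labour between web server, message broker and workers can be illustrated without RabbitMQ or Celery. In the standard-library sketch below, a `queue.Queue` stands in for the broker and threads stand in for computation workers. It only demonstrates the pattern. Submitting returns immediately, idle workers pick up queued tasks and record their status, and more workers can be started while tasks are waiting. All names are hypothetical, and a real deployment would use the Celery and RabbitMQ APIs instead.

```python
import queue
import threading
from typing import Callable, Dict


class MiniBroker:
    """Toy stand-in for a message broker that hands tasks to worker threads."""

    def __init__(self, n_workers: int = 2) -> None:
        self._tasks: "queue.Queue" = queue.Queue()
        self._lock = threading.Lock()
        self.status: Dict[str, str] = {}
        self.results: Dict[str, object] = {}
        for _ in range(n_workers):
            self.add_worker()

    def add_worker(self) -> None:
        """Start one more worker; capacity can grow while tasks are queued."""
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, task_id: str, func: Callable[[], object]) -> None:
        """Enqueue a task and return immediately, like submitting an experiment."""
        with self._lock:
            self.status[task_id] = "queued"
        self._tasks.put((task_id, func))

    def _work(self) -> None:
        while True:
            task_id, func = self._tasks.get()
            with self._lock:
                self.status[task_id] = "running"
            try:
                result, state = func(), "done"
            except Exception as exc:  # record the failure and keep the worker alive
                result, state = exc, "failed"
            with self._lock:
                self.results[task_id] = result
                self.status[task_id] = state
            self._tasks.task_done()

    def wait(self) -> None:
        """Block until every submitted task has been processed."""
        self._tasks.join()
```

The `status` dictionary plays the role of the temporary history, which the web view can poll while computations run.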



E Tutorials 

In this section we provide various tutorials on how to use easyGWAS. We demonstrate how to perform a genome-wide association mapping, how to upload your own private phenotypes, and how to share them with collaborators or make them public to the scientific community. Furthermore, we show how to download summary statistics of GWA experiments as well as published genotype and phenotype data. Screenshots are provided for all important steps.

E.1 How to perform a GWAS easily?

In this tutorial we demonstrate how to easily perform a GWA study using Arabidopsis thaliana and the already published phenotype FLC (a flowering-time-related phenotype).

1. If not already done: Create a new easyGWAS account and log in. 

2. Navigate to the easyGWAS wizard:

Menu: GWA-Experiments > Create new GWA

3. First, select a species and a dataset. Here we choose the species Arabidopsis thaliana and the dataset AtPolyDB (call method 75, Horton et al.) [1] and click Continue.



[Screenshot: GWA Wizard, step 1. Select a species and a dataset for your GWA experiment (species: Arabidopsis thaliana; dataset: AtPolyDB (call method 75, Horton et al.))]



4. Select a phenotype. Here you can choose among published, private/shared and public phenotypes. Additionally, you can upload your own data (see Tutorial E.2). Here we select the published phenotype FLC [3]. For this purpose, select the tab 2.1 Select an existing published phenotype and type the name of the phenotype (FLC [3]) into the input field. Auto-completion will help you select the correct one. Click Continue.



[Screenshot: GWA Wizard, step 2. Select phenotypes for your GWA experiment (tabs: 2.1 Select an existing published phenotype, 2.2 Upload new phenotypes, 2.3 Select a private or shared phenotype, 2.4 Select a public phenotype; phenotype entered: FLC)]

5. In the following step you can add additional factors such as principal components or covariates
(e.g. environmental factors, gender-specific characteristics). In this tutorial we do not add
any additional factors. Click Continue in the tab 3.1 No additional factors.



[Screenshot: GWA Wizard, step 3: tab 3.1 No additional factors selected; further tabs 3.2 Covariate - Principal component analysis (PCA) and 3.3 Covariate - Upload new covariates. The experiment overview lists the selected species, dataset and phenotype (FLC).]



6. Next, select the genotypic data. You can use all provided SNPs, specific chromosomes, or a
region of SNPs. For this tutorial we select chromosomes 1 and 5 in the tab 4.2 Select one or
several chromosomes for Arabidopsis thaliana by checking the corresponding boxes. Click
Select chromosomes.
[Screenshot: GWA Wizard, step 4: tab 4.2 Select one or several chromosomes for Arabidopsis thaliana, with Chr1 and Chr5 checked; further tabs 4.1 Select all SNPs for Arabidopsis thaliana and 4.3 Select a subset of SNPs for Arabidopsis thaliana]



7. Select the association method. The algorithm view also offers options to apply
transformations or filters to the data. Which methods and transformations are available
depends on the chosen data: the web application analyzes the data on the fly and enables
only those options that are applicable to your data. Here we keep the default settings, a
linear regression without any transformations. Click Continue.



[Screenshot: GWA Wizard, step 5: 5.1 Algorithm, with Linear regression selected (description: linear regression tests whether an arbitrary phenotype is associated with a single nucleotide polymorphism (SNP); each marker/SNP is tested individually, and additional factors/covariates such as environmental factors, gender-specific covariates or principal components (PCs) can be added to the model); 5.2 Filter: threshold for the minor allele frequency (MAF); 5.3 Standardization: methods for genotypes and phenotypes (default: no standardization)]



8. In the last step you can review all your inputs and make any necessary adjustments. If
everything is correct, submit your experiment to the computation servers by clicking
Submit Experiment.

9. Your experiment is now submitted and all computations run in the background. The current
view is refreshed every 3 seconds. In the meantime you can submit new experiments or browse
the data. This experiment finishes in about 60 seconds, after which you are automatically
redirected to the result view.



[Screenshot: GWA-Experiment submitted. Once the computations are ready you are notified via a push notification or redirected directly to the results page; if you leave the web application, you are notified via e-mail. The experiment is saved in your temporary history, and the page shows its unique experiment ID and running time.]
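The analysis configured in steps 6 and 7 (chromosomes 1 and 5, a minor-allele-frequency filter and a per-SNP linear regression without transformations) can be illustrated with a short sketch. This is not the easyGWAS implementation: the in-memory layout, the function names (minor_allele_frequency, snp_regression, run_gwas) and the additive 0/1/2 genotype coding are hypothetical choices made for illustration, and the p-value uses a normal approximation to the t distribution that is only reasonable for large samples.

```python
import math
from typing import Iterable, List, Set, Tuple

# Hypothetical in-memory layout: one tuple (chromosome, position, genotypes) per SNP,
# with genotypes coded as additive minor-allele counts (0, 1, 2), one per accession.
Snp = Tuple[int, int, List[int]]


def minor_allele_frequency(genotypes: List[int]) -> float:
    """Frequency of the less common allele under 0/1/2 coding."""
    p = sum(genotypes) / (2.0 * len(genotypes))
    return min(p, 1.0 - p)


def snp_regression(genotypes: List[int], phenotype: List[float]) -> Tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares; return (b, approximate two-sided p)."""
    n = len(phenotype)
    mx = sum(genotypes) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    # Residual variance with n - 2 degrees of freedom (intercept and slope estimated).
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(genotypes, phenotype))
    sigma2 = rss / (n - 2)
    if sigma2 == 0.0:
        return beta, 0.0  # perfect fit
    t = beta / math.sqrt(sigma2 / sxx)
    # Normal approximation to the t distribution; only reasonable for large n.
    return beta, math.erfc(abs(t) / math.sqrt(2.0))


def run_gwas(snps: Iterable[Snp], phenotype: List[float],
             chromosomes: Set[int], min_maf: float) -> List[Tuple[int, int, float, float]]:
    """Test every SNP on the selected chromosomes that passes the MAF filter."""
    results = []
    for chrom, pos, genotypes in snps:
        if chrom not in chromosomes:
            continue  # chromosome selection (wizard tab 4.2)
        if len(set(genotypes)) < 2:
            continue  # no genotypic variation: the slope cannot be estimated
        if minor_allele_frequency(genotypes) < min_maf:
            continue  # MAF filter (wizard tab 5.2)
        beta, p = snp_regression(genotypes, phenotype)
        results.append((chrom, pos, beta, p))
    return results
```

For real data one would replace the normal approximation with the exact Student t tail, for example 2 * scipy.stats.t.sf(abs(t), n - 2), and add covariates such as principal components through a multiple regression rather than this single-predictor fit.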



E.2 How to upload your own phenotype and perform a GWAS on it?

Here we show how to easily upload new phenotypic data and how to perform a GWAS with it. 

1. Navigate to the easyGWAS wizard:

Menu: GWA-Experiments → Create new GWA

2. First, select a species and a dataset. Here we choose the species Arabidopsis thaliana and the
dataset AtPolyDB (call method 75, Horton et al.) [1] and click Continue.



[Screenshot: GWA Wizard, step 1: species Arabidopsis thaliana and dataset AtPolyDB (call method 75, Horton et al.) selected]



3. Now open the tab 2.2 Upload new phenotypic data and download the linked demo file.
Click Choose File, select the demo file to upload it, and click Continue.
[Screenshot: GWA Wizard, step 2: tab 2.2 Upload new phenotypic data, with links to browse all available accessions and to download an example file, and a Choose File button for the file with the new phenotype (the file format is described in the FAQ); further tabs 2.1 Select an existing published phenotype, 2.3 Select a private or shared phenotype and 2.4 Select a public phenotype]



4. Finally, proceed as in Tutorial E.1. You may also try different algorithms and transformations.
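To make the idea of a phenotype transformation concrete, the sketch below implements two common choices, a z-score standardization and a natural-log transformation. Whether easyGWAS offers exactly these options, and how it implements them, is an assumption; the function names standardize and log_transform are hypothetical.

```python
import math
import statistics
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Z-score: subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0.0:
        raise ValueError("phenotype is constant; cannot standardize")
    return [(v - mean) / sd for v in values]


def log_transform(values: List[float]) -> List[float]:
    """Natural log, e.g. for right-skewed traits; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires positive phenotype values")
    return [math.log(v) for v in values]
```

Standardization changes the scale of the effect estimates but not which SNPs are associated, whereas a log transformation changes the shape of the distribution and can therefore change the results.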
E.3 How to store, share or publish your GWA experiments?

To keep interesting findings, users can save their experiments permanently using the save
function in My temporary history. There you can rename an experiment and save it to your
experiment history, My experiments.



[Screenshot: Temporary GWA-Experiment history (tabs: Temporary GWA-Experiment history, Saved GWA-Experiments, Shared GWA-Experiments). A notice states that temporary experiments are available for 48 h and must be saved with the save button to be stored permanently. The table lists experiment name, experiment ID, date, species, SNPs, phenotypes, algorithm, running status and operations, here for two linear-regression experiments on FLC in Arabidopsis thaliana from Nov. 11, 2012.]



To simplify data exchange with colleagues and the scientific community, saved experiments can
be shared or published. To do so, open the category My experiments and click either Share
Experiment or Publish experiment. Note that publishing an experiment performed on private
phenotypes and/or covariates requires publishing all of the private data it depends on.
Please provide meaningful names.
[Screenshot: dialog "Make your experiment public to the scientific community", warning that published experiments cannot be deleted and asking for meaningful names. Because private phenotypes and private covariates are linked to this experiment, the dialog lists the experiment and each linked covariate and phenotype, each with a name field, all of which must be published together.]




E.4 How to download summary statistics of your results? 

For each experiment, summary statistics can be downloaded using the download assistant on the
GWAS result page. Click Download Summary Statistics and choose your preferred format.
Currently, comma-separated values (CSV) and Hierarchical Data Format 5 (HDF5²) files are
available. The summary statistics files contain p-values for each locus and chromosome; the HDF5
file additionally records which samples were used.



Manhattan Plots QQ-Plots Phenotype Explorer SNP Annotations Detailed experiment summary 



GWA-Results for experiment with id: f965c71b -6743-461 1-B7fd-b22e4547adf 9 



GWA result options 



Manhattan -plot for chromosome 1 

■ -log 10(p -value] ■ Bonferroni threshold [0.05] 

25 
§10 

S 5 



Download Summary Statistics 

As CSV summary -statistics file 
As HDF5 summary -statistics file 

File format FAQ 









1 








































■ 


















* mmrm *i 









2000000 4000000 6000000 flOOOODO 10000000 12000000 14000000 16000000 18000000 

chromosomal position [bp] 
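Once a CSV file has been downloaded, the Bonferroni threshold shown in the Manhattan plot can be recomputed locally. The sketch below assumes a header with columns named chromosome, position and pvalue; these names, like the function name bonferroni_hits, are placeholders, so check the File format FAQ for the actual layout. It also assumes one row per tested SNP, so the number of tests m equals the number of rows.

```python
import csv
import io
import math
from typing import List, Tuple


def bonferroni_hits(csv_text: str, alpha: float = 0.05,
                    chrom_col: str = "chromosome", pos_col: str = "position",
                    p_col: str = "pvalue") -> Tuple[float, List[Tuple[str, int, float]]]:
    """Return the Bonferroni threshold and (chromosome, position, -log10 p) of each hit.

    Column names are placeholders; adjust them to the real file header.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no SNPs in file")
    threshold = alpha / len(rows)  # alpha divided by the number of tested SNPs
    hits = []
    for row in rows:
        p = float(row[p_col])
        if p <= threshold:
            # -log10(p) is the y-axis value of the SNP in a Manhattan plot.
            score = -math.log10(p) if p > 0 else math.inf
            hits.append((row[chrom_col], int(row[pos_col]), score))
    return threshold, hits
```

To use it on a downloaded file, read the file with open(path, newline="") and pass its contents as csv_text. If the file covers only some chromosomes, the threshold refers to the SNPs in that file rather than to the whole genome.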



E.5 How to download publicly available data? 

To download complete genotype and phenotype data, use the Download Center in the main
menu. There you can download all available data for each species and dataset in PLINK [5],
CSV and HDF5 formats.
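As an illustration of working with a PLINK download, the sketch below parses the text-based .map/.ped pair into minor-allele counts. It assumes the export uses this text variant (PLINK also defines the binary .bed/.bim/.fam format, which is not handled here), biallelic SNPs and the default missing-genotype code "0"; the function names read_map and read_ped are hypothetical. For homozygous inbred accessions such as those of A. thaliana, the counts will typically be 0 or 2.

```python
from collections import Counter
from typing import List, Optional, Tuple


def read_map(map_text: str) -> List[Tuple[str, str, int]]:
    """Parse a PLINK .map file: chromosome, SNP ID, genetic distance, bp position."""
    snps = []
    for line in map_text.splitlines():
        if line.strip():
            chrom, snp_id, _cm, bp = line.split()
            snps.append((chrom, snp_id, int(bp)))
    return snps


def read_ped(ped_text: str, n_snps: int) -> Tuple[List[str], List[List[Optional[int]]]]:
    """Parse a PLINK .ped file into sample IDs and minor-allele counts (None = missing)."""
    ids: List[str] = []
    pairs: List[List[Tuple[str, str]]] = []
    for line in ped_text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        alleles = fields[6:]  # columns 1-6: FID, IID, father, mother, sex, phenotype
        if len(alleles) != 2 * n_snps:
            raise ValueError(f"sample {fields[1]}: expected {2 * n_snps} allele columns")
        ids.append(fields[1])
        pairs.append([(alleles[2 * j], alleles[2 * j + 1]) for j in range(n_snps)])

    matrix: List[List[Optional[int]]] = [[None] * n_snps for _ in ids]
    for j in range(n_snps):
        # Count alleles across samples, ignoring the missing code "0".
        counts = Counter(a for row in pairs for a in row[j] if a != "0")
        ranked = sorted(counts, key=lambda a: (counts[a], a))  # rarest first, ties by name
        minor = ranked[0] if len(ranked) > 1 else None  # monomorphic SNP: no minor allele
        for i, row in enumerate(pairs):
            a1, a2 = row[j]
            if a1 != "0" and a2 != "0":
                matrix[i][j] = int(a1 == minor) + int(a2 == minor)
    return ids, matrix
```

The number of SNPs passed to read_ped is the number of entries returned by read_map, since PLINK lists the variants in the same order in both files.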